A Hybrid Hardware/Software Generated Prefetching Thread Mechanism on Chip Multiprocessors

نویسندگان

  • Hou Rui
  • Longbing Zhang
  • Weiwu Hu
چکیده

This paper proposes a hybrid hardware/software generated prefetching thread mechanism on Chip Multiprocessors(CMP). Two kinds of prefetching threads appear in our hybrid mechanism. Most threads belong to Dynamic Prefetching Thread, which are automatically generated, triggered, spawn and managed by hardware; The others are of Static Prefetching Thread, targeting at the critical delinquent loads which can not be accurately or timely predicted by Dynamic Prefetching Thread. Static Prefetching Threads are statically generated by binary-level optimization tool with the guide of profiling information. Also, some aggressive thread construction policies are proposed. Furthermore, the necessary hardware infrastructure for CMP supporting this hybrid mechanism are described. For a set of memory limited benchmarks with complicated access patterns, an average speedup of 3.1% is achieved on dual-core CMP when constructing basic hardware-generated prefetching thread, and this gain grows to 31% when adopting our hybrid mechanism.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread

A Dynamic Prefetching Thread scheme is proposed in this paper to accelerate sequential programs on Chip Multiprocessors. This scheme belongs to the hardware-generated thread-based prefetching technique and can decouple the performance and correctness to some extent. This paper describes the necessary hardware infrastructure supporting Dynamic Prefetching Thread on traditional Chip Multiprocesso...

متن کامل

Effective Instruction Prefetching In Chip Multiprocessors

threaded application performance, often achieved through instruction level parallelism per chip is increasing, the software and hardware techniques to exploit the potential of studies mostly involve distributed shared memory multiprocessors and fetching will not be fully effective at masking the remote fetch latency. the effective address of the load instructions along that path based upon a hi...

متن کامل

A Preliminary Evaluation of Cache-miss-initiated Prefetching Techniques in Scalable Multiprocessors

Prefetching is an important technique for reducing the average latency of memory accesses in scalable cache-coherent multiprocessors. Aggressive prefetching can signiicantly reduce the number of cache misses, but may introduce bursty network and memory traac, and increase data sharing and cache pollution. Given that we anticipate enormous increases in both network bandwidth and latency, we exam...

متن کامل

Library-based Prefetching for Pointer-intensive Applications

Processor speed has been improving faster than memory latency for over two decades. Thus, an increased portion of execution time is spent stalling on loads, waiting for data from the memory hierarchy. Prefetching is an effective mechanism to hide memory latency for applications with low temporal locality. However, existing hardware prefetching techniques work well for array-based programs but n...

متن کامل

Joint Exploration of Hardware Prefetching and Bandwidth Partitioning in Chip Multiprocessors

In this paper, we propose an analytical model-based study to investigate how hardware prefetching and memory bandwidth partitioning impact Chip Multi-Processors (CMP) system performance and how they interact. The model includes a composite prefetching metric that can help determine under which conditions prefetching can improve system performance, a bandwidth partitioning model that takes into ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006